Automatically Annotating the ODP Web Taxonomy
Abstract
In this paper we present the ideas and algorithms developed around our KeyGen Web Taxonomy Annotation engine. KeyGen annotates the Open Directory Project (ODP), also known as Dmoz, with meaningful and previously unknown keywords by drawing on domain knowledge extracted from the WWW. We present two algorithms: i) the PageParse Algorithm, which efficiently extracts keywords from Web taxonomies using a combination of local and global scores, and ii) the Support Algorithm, an I/O-optimized algorithm for coalescing hierarchies of keywords. We then present results i) from constructing a richly annotated ODP Web taxonomy and ii) from evaluating the correctness of this structure by performing an automated classification of Web pages.
Related resources
Open Directory Project based universal taxonomy for Personalization of Online (Re)sources
Content personalization reflects the ability of content classification into (predefined) thematic units or information domains. Content nodes in a single thematic unit are related to a greater or lesser extent. An existing connection between two available content nodes assumes that the user will be interested in both resources (but not necessarily to the same extent). Such a connection (and its...
Utilizing global and path information with language modelling for hierarchical text classification
Hierarchical text classification of a Web taxonomy is challenging because it is a very large-scale problem with hundreds of thousands of categories and associated documents. Furthermore, the conceptual levels and training data availability of categories vary widely. The narrow-down approach is the state-of-the-art that utilizes a search engine for generating candidates from the taxonomy and build...
Automatic Topic Ontology Construction Using Semantic Relations from WordNet and Wikipedia
Due to the explosive growth of web technology, a huge amount of information is available as web resources over the Internet. Therefore, in order to access the relevant content from the web resources effectively, considerable attention is paid to the semantic web for efficient knowledge sharing and interoperability. Topic ontology is a hierarchy of a set of topics that are interconnected using s...
Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns
BACKGROUND: It is time-consuming to build an ontology with many terms and axioms. Thus it is desired to automate the process of ontology development. Ontology Design Patterns (ODPs) provide a reusable solution to solve a recurrent modeling problem in the context of ontology engineering. Because ontology terms often follow specific ODPs, the Ontology for Biomedical Investigations (OBI) developers...
Publication date: 2007